Semi-supervised object detection (SSOD) aims to boost detection performance by leveraging extra unlabeled data. The teacher-student framework has been shown to be promising for SSOD, in which a teacher network generates pseudo-labels for unlabeled data to assist the training of a student network. Since the pseudo-labels are noisy, filtering the pseudo-labels is crucial to exploit the potential of such framework. Unlike existing suboptimal methods, we propose a two-step pseudo-label filtering for the classification and regression heads in a teacher-student framework. For the classification head, OCL (Object-wise Contrastive Learning) regularizes the object representation learning that utilizes unlabeled data to improve pseudo-label filtering by enhancing the discriminativeness of the classification score. This is designed to pull together objects in the same class and push away objects from different classes. For the regression head, we further propose RUPL (Regression-Uncertainty-guided Pseudo-Labeling) to learn the aleatoric uncertainty of object localization for label filtering. By jointly filtering the pseudo-labels for the classification and regression heads, the student network receives better guidance from the teacher network for object detection task. Experimental results on Pascal VOC and MS-COCO datasets demonstrate the superiority of our proposed method with competitive performance compared to existing methods.
translated by 谷歌翻译
单眼3D对象检测对于自动驾驶具有重要意义,但仍然具有挑战性。核心挑战是在没有明确深度信息的情况下预测对象的距离。与在大多数现有方法中将距离作为单个变量回归不同,我们提出了一种基于几何几何距离的分解,以通过其因子恢复距离。分解因素因物体到最具代表性和稳定的变量的距离,即图像平面中的物理高度和投影视觉高度。此外,该分解保持了两个高度之间的自我矛盾,当两个预测高度不准确时,导致距离的距离预测可靠。分解还使我们能够追踪不同场景的距离不确定性的原因。这种分解使距离预测可解释,准确和稳健。我们的方法直接通过紧凑的体系结构从RGB图像中预测3D边界框,从而使训练和推理简单有效。实验结果表明,我们的方法在单眼3D对象检测上实现了最先进的性能,而鸟类视图Kitti数据集的眼睛视图任务,并且可以推广到具有不同摄像机内在的图像。
translated by 谷歌翻译
旋转不变的面部检测,即用任意旋转平面(RIP)角度的检测面,广泛需要在无约束的应用中被广泛地需要,但由于面部出现的较大变化,仍然仍然是一个具有挑战性的任务。大多数现有方法符合速度或准确性以处理大的撕裂变体。为了更有效地解决这个问题,我们提出了逐步校准网络(PCN)以粗略的方式执行旋转不变的面部检测。 PCN由三个阶段组成,每个阶段不仅将面与非面孔区分开,而且还校准了每个面部候选者的RIP方向逐渐直立。通过将校准过程划分为几个渐进步骤,并且仅预测早期阶段中的粗定向,PCN可以实现精确且快速校准。通过对脸部与逐渐减小的RIP范围进行二进制分类,PCN可以准确地检测满360 ^ {\ rIC} $ RIP角度的面部。这种设计导致实时旋转不变面检测器。在野外的多面向FDDB的实验和疯狂旋转面的较宽面的具有挑战性的子集表明我们的PCN实现了非常有希望的性能。
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Image Virtual try-on aims at replacing the cloth on a personal image with a garment image (in-shop clothes), which has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of clothing images, however, occlusion remains a pernicious effect for realistic virtual try-on. In this work, we first present a comprehensive analysis of the occlusions and categorize them into two aspects: i) Inherent-Occlusion: the ghost of the former cloth still exists in the try-on image; ii) Acquired-Occlusion: the target cloth warps to the unreasonable body part. Based on the in-depth analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which can generate semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts a sharpened semantic parsing on the try-on person. Aided by semantics guidance and pose prior, various complexities of texture are selectively blending with human parts in a copy-and-paste manner. Then, the Generative Module (GM) is utilized to take charge of synthesizing the final try-on image and learning to de-occlusion jointly. In comparison to the state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
translated by 谷歌翻译
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
translated by 谷歌翻译
As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpus, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation CORGI-PM, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which requires the models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To our best knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
translated by 谷歌翻译
Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy's value in infinite horizon settings as well. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform.
translated by 谷歌翻译
Off-Policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision making problems ranging from healthcare to technology industries. Most of the work in existing literature is focused on evaluating the mean outcome of a given policy, and ignores the variability of the outcome. However, in a variety of applications, criteria other than the mean may be more sensible. For example, when the reward distribution is skewed and asymmetric, quantile-based metrics are often preferred for their robustness. In this paper, we propose a doubly-robust inference procedure for quantile OPE in sequential decision making and study its asymptotic properties. In particular, we propose utilizing state-of-the-art deep conditional generative learning methods to handle parameter-dependent nuisance function estimation. We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform. In particular, we find that our proposed estimator outperforms classical OPE estimators for the mean in settings with heavy-tailed reward distributions.
translated by 谷歌翻译
The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.
translated by 谷歌翻译